Vectorization and Parallelization of Loops in C/C++ Code

Authors

  • Xuejun Liang
  • Ali A. Humos
Abstract

Modern processors support parallel execution of a program across multiple cores, and they support vector operations through extended SIMD instructions. To make a program run faster, its time-consuming loops can often be parallelized and vectorized to exploit both capabilities. In this paper, vector multiplication and matrix multiplication are used as examples to illustrate how to parallelize and vectorize loops in a C/C++ program with the Microsoft Visual C++ compiler or the GNU gcc (g++) compiler. An overview of the Intel
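The kind of loop transformation the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes OpenMP pragmas as the mechanism, reads "vector multiplication" as an element-wise product, and stores matrices as row-major n x n arrays. The function names `vec_mul` and `mat_mul` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: "vector multiplication" means the element-wise product
 * c[i] = a[i] * b[i]. `restrict` tells the compiler the arrays do not
 * overlap, which removes a common obstacle to vectorization. */
void vec_mul(const float *restrict a, const float *restrict b,
             float *restrict c, int n)
{
    assert(a && b && c && n >= 0);
    /* Ask the compiler to emit SIMD instructions for this loop. */
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        c[i] = a[i] * b[i];
}

/* C = A * B for row-major n x n matrices (hypothetical helper). */
void mat_mul(const float *restrict a, const float *restrict b,
             float *restrict c, int n)
{
    assert(a && b && c && n >= 0);
    /* Rows of C are independent, so the outer loop is split across cores. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j)
            c[i * n + j] = 0.0f;
        /* i-k-j order: the innermost loop walks B and C with unit stride,
         * which is the access pattern SIMD units handle best. */
        for (int k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            #pragma omp simd
            for (int j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

With gcc, a build along the lines of `gcc -O3 -fopenmp` enables both pragmas; without `-fopenmp` the pragmas are ignored and the code still runs serially with the same results. Microsoft Visual C++ spells `restrict` as `__restrict` and uses its own OpenMP switches, so the sketch would need small adjustments there.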


Similar resources

Efficient Exploitation of Parallelism on Pentium III and Pentium 4 Processor-Based Systems

Systems based on the Pentium III and Pentium 4 processors enable the exploitation of parallelism at a fine- and medium-grained level. Dual- and quad-processor systems, for example, enable the exploitation of medium-grained parallelism by using multithreaded code that takes advantage of multiple control and arithmetic logic units. Streaming Single-Instruction-Multiple-Data (SIMD) extensions, on the o...


Vectorization and Parallelization of the Adaptive Mesh Refinement N-body Code

In this paper, we describe our vectorized and parallelized adaptive mesh refinement (AMR) N-body code with shared time steps, and report its performance on a Fujitsu VPP5000 vector-parallel supercomputer. Our AMR N-body code puts hierarchical meshes recursively where higher resolution is required, and the time steps of all particles are the same. The parts which are the most difficult to vectori...


Lecture Notes on Cache Iteration & Data Dependencies, 15-411: Compiler Design

Cache optimization can have a huge impact on program execution speed. It can accelerate numerical programs by a factor of 2 to 5. Loops are the parts of the program that are generally executed most often. That is why cache optimization usually focuses exclusively on handling loops. Especially for loops that execute very often, optimizing small chunks of source code can have a fairly significan...


A general compilation algorithm to parallelize and optimize counted loops with dynamic data-dependent bounds

We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamically computed, data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable...


McFLAT: A Profile-Based Framework for MATLAB Loop Analysis and Transformations

Parallelization and optimization of the MATLAB programming language present several challenges due to the dynamic nature of MATLAB. Since MATLAB does not have static type declarations, neither the shape and size of arrays nor the loop bounds are known at compile time. This means that many standard array dependence tests and associated transformations cannot be applied straightforwardly. On t...




Publication date: 2017